Collaborators: Tomo Tanaka, John Eykelenboom

Proposal


This is a continuation of the previous Chromatin Compaction Modelling project. John is working on new data, but this time we have full 3D co-ordinates of each dot from the tracking software (which one?). The first step is to convert these co-ordinates into cell states, denoted by colours.

Description by John

Raw data for cells is in the form of Excel files. There are 3 files per cell:

  • the whole track length [fr x - z] = frames x to z
  • up to NEBD [fr x - y]
  • after NEBD [fr y+1 – z]

In the case of the first file type there is no assignment of NEBD, so these cells cannot be automatically aligned as in our previous work. Maybe the latter two file types are better; the two halves can be stitched together and then different cells aligned with each other according to the join (let me know if you think this will work well).

When I tracked the dots the objects are organised into “tracks” that always refer to either red or green (not mixed). The colour for a given track could be determined by looking at the intensities in ch.1 (red) or ch.2 (green) for the objects of the track, compared to the intensities for the same channel in the other tracks. I could also assign the colours on the Excel sheet if this would be easier (e.g. manually make a new tab with the info). [Information for channels 3 and 4 is simply masked versions of 1 and 2 that I use for tracking more easily.]

The rules and colour coding we worked with in our previous study (distances are approx.):

  • Light Blue = 2 dots (not overlapping e.g. > 0.3 µm)
  • Dark blue = 3 dots (where dots of the same colour have distance < 0.75 µm)
  • Brown = 3 dots (where dots of the same colour have distance > 0.75 µm)
  • Pink = 4 dots (no restriction on distance)
  • Red = 4 dots (with 2 pairs of red/green overlapping e.g. < 0.3 µm)

Later addition:

  • Black = 2 dots (overlapping e.g. < 0.3 µm)
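The rules above can be sketched as a simple classifier. This is a minimal Python sketch, not the actual pipeline (the analysis itself is in R); `reds` and `greens` are assumed lists of (x, y, z) positions in µm, and the limits follow the text (0.3 µm for overlap, 0.75 µm for same-colour separation):

```python
from math import dist

OVERLAP = 0.3  # red/green overlap limit, µm
SPLIT = 0.75   # same-colour separation limit (dark blue vs brown), µm

def classify(reds, greens):
    n = len(reds) + len(greens)
    if n == 2:
        # one red and one green dot
        return "black" if dist(reds[0], greens[0]) < OVERLAP else "light blue"
    if n == 3:
        # the doubled colour has two dots; compare their separation
        pair = reds if len(reds) == 2 else greens
        return "dark blue" if dist(pair[0], pair[1]) < SPLIT else "brown"
    if n == 4:
        # pink, or red if two red/green pairs overlap (pairing rule below)
        return "pink or red"
    return None
```

The four-dot case needs the red/green pairing rule discussed in the state identification section, so it is left as a placeholder here.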

Colour identification

Objects are organised into “tracks” that always refer to one colour, red or green. The software outputs intensities measured in two channels: red and green. Initially, I thought I could simply decide the colour of a dot based on which intensity is larger, red or green. However, some tracks contain dots where green > red at some time points and red > green at others. See this example, made for cell_1. Tracks 1000000000 and 1000000101 have intensities on both sides of the red = green line.

For now we use manual annotation, that is, a track with a given ID has a colour assigned manually by John.

Intensity difference

Intensity in red and green channels can fluctuate and even a green dot can sometimes become a bit red. However, when we have two or more dots, it might be easier to assign colours by comparing them in each frame.
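The comparison idea can be sketched as follows. This is a hypothetical Python illustration, not the pipeline code: each dot is assumed to be a tuple (track_id, red_intensity, green_intensity), and we assume we know how many green dots the frame should contain.

```python
def assign_colours(dots, n_green):
    """Label the n_green dots with the largest green - red difference as
    green, and the rest as red, within a single frame."""
    ranked = sorted(dots, key=lambda d: d[2] - d[1], reverse=True)
    green_ids = {d[0] for d in ranked[:n_green]}
    return {d[0]: ("green" if d[0] in green_ids else "red") for d in dots}
```

Ranking within the frame is robust to a global intensity drift that would fool a fixed red-versus-green threshold.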

The figure below shows the intensity difference (green - red) for each dot at each time point for cell 1. The letters at the bottom indicate the state (L-light blue, B-dark blue, K-black, W-brown, P-pink, R-red). The digits show the number of points in the XYZ data. Bold font indicates missing points in intensity data.

Now we see that in most cases green dots have a higher green − red difference than red dots, as expected. This holds even for frames around −8 to −3 min, where red dots have a positive green − red difference: the green dots are still greener.

There are a few issues with this, marked with grey boxes. There are a few types of issue, and I discuss them below by looking at the raw data.

-31 min

There is only one green dot in the figure. Here is some raw data from this frame.

Position

| Position X | Position Y | Position Z | Unit | Category | Collection | Time | TrackID    | ID  |
|-----------:|-----------:|-----------:|------|----------|------------|-----:|-----------:|----:|
| -1805.33   | 607.38     | 30.36      | µ    | Spot     | Position   | 6    | 1000000047 | 52  |
| -1805.52   | 607.47     | 30.52      | µ    | Spot     | Position   | 6    | 1000000143 | 193 |

Intensity Max Ch=1 Img=1

| Intensity Max | Unit | Category | Channel | Image   | Time | TrackID    | ID |
|--------------:|------|----------|--------:|---------|-----:|-----------:|---:|
| 7487          | NA   | Spot     | 1       | Image 1 | 6    | 1000000047 | 52 |

Intensity Max Ch=2 Img=1

| Intensity Max | Unit | Category | Channel | Image   | Time | TrackID    | ID |
|--------------:|------|----------|--------:|---------|-----:|-----------:|---:|
| 12923         | NA   | Spot     | 2       | Image 1 | 6    | 1000000047 | 52 |

There are two dots in the Position sheet, but only one in intensity sheets, for red and green channels. Intensity data for track 1000000143 is missing in both channels.
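Such gaps can be found automatically by comparing the row keys of the Position sheet with those of an intensity sheet. A minimal sketch, assuming each row is identified by a hypothetical (Time, TrackID, ID) key:

```python
def missing_intensity(position_keys, intensity_keys):
    """Return keys present in the Position sheet but absent from an
    intensity sheet, i.e. dots with a position but no measurement."""
    return sorted(set(position_keys) - set(intensity_keys))
```

Run per channel, this flags every frame where a positioned dot lacks an intensity value.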

-13 min

Here we have two red dots in the plot.

Position

| Position X | Position Y | Position Z | Unit | Category | Collection | Time | TrackID    | ID  |
|-----------:|-----------:|-----------:|------|----------|------------|-----:|-----------:|----:|
| -1803.68   | 604.43     | 29.02      | µ    | Spot     | Position   | 24   | 1000000047 | 70  |
| -1804.14   | 604.45     | 28.86      | µ    | Spot     | Position   | 24   | 1000000143 | 172 |
| -1803.38   | 603.94     | 29.23      | µ    | Spot     | Position   | 24   | 1000000143 | 173 |

Intensity Max Ch=1 Img=1

| Intensity Max | Unit | Category | Channel | Image   | Time | TrackID    | ID  |
|--------------:|------|----------|--------:|---------|-----:|-----------:|----:|
| 9083          | NA   | Spot     | 1       | Image 1 | 24   | 1000000143 | 172 |
| 8027          | NA   | Spot     | 1       | Image 1 | 24   | 1000000143 | 173 |

Intensity Max Ch=2 Img=1

| Intensity Max | Unit | Category | Channel | Image   | Time | TrackID    | ID  |
|--------------:|------|----------|--------:|---------|-----:|-----------:|----:|
| 4949          | NA   | Spot     | 2       | Image 1 | 24   | 1000000143 | 172 |
| 7139          | NA   | Spot     | 2       | Image 1 | 24   | 1000000143 | 173 |

Just like above, intensity data for track 1000000047 is missing: there are three dots with measured positions, but only two of them have measured intensities.

+6 min

This is an interesting case.

Position

| Position X | Position Y | Position Z | Unit | Category | Collection | Time | TrackID    | ID |
|-----------:|-----------:|-----------:|------|----------|------------|-----:|-----------:|---:|
| -1799.16   | 605.16     | 32.66      | µ    | Spot     | Position   | 43   | 1000000000 | 6  |
| -1798.56   | 605.17     | 32.73      | µ    | Spot     | Position   | 43   | 1000000000 | 7  |
| -1798.51   | 605.17     | 32.68      | µ    | Spot     | Position   | 43   | 1000000088 | 98 |
| -1799.02   | 605.13     | 32.23      | µ    | Spot     | Position   | 43   | 1000000088 | 99 |

Intensity Max Ch=1 Img=1

| Intensity Max | Unit | Category | Channel | Image   | Time | TrackID    | ID |
|--------------:|------|----------|--------:|---------|-----:|-----------:|---:|
| 5578          | NA   | Spot     | 1       | Image 1 | 43   | 1000000000 | 6  |
| 5079          | NA   | Spot     | 1       | Image 1 | 43   | 1000000000 | 7  |
| 5079          | NA   | Spot     | 1       | Image 1 | 43   | 1000000088 | 98 |
| 4254          | NA   | Spot     | 1       | Image 1 | 43   | 1000000088 | 99 |

Intensity Max Ch=2 Img=1

| Intensity Max | Unit | Category | Channel | Image   | Time | TrackID    | ID |
|--------------:|------|----------|--------:|---------|-----:|-----------:|---:|
| 5877          | NA   | Spot     | 2       | Image 1 | 43   | 1000000000 | 6  |
| 8890          | NA   | Spot     | 2       | Image 1 | 43   | 1000000000 | 7  |
| 8890          | NA   | Spot     | 2       | Image 1 | 43   | 1000000088 | 98 |
| 7052          | NA   | Spot     | 2       | Image 1 | 43   | 1000000088 | 99 |

There are four dots in both position and intensity sheets, but the intensities of the dots in rows 2 and 3 are identical in each channel. It looks like a dot's intensity was missing and was replaced with the intensity of another dot. As a result, we have a red dot with a green intensity far larger than it should be.
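This kind of silent substitution can be caught by flagging identical intensity values across different dot IDs within the same frame and channel. A minimal sketch with a hypothetical input format (`frame` maps dot ID to intensity):

```python
from collections import Counter

def duplicated_intensities(frame):
    """Return IDs of dots whose intensity value is shared with at least
    one other dot in the same frame and channel."""
    counts = Counter(frame.values())
    return sorted(i for i, v in frame.items() if counts[v] > 1)
```

Exact equality of raw counts between two distinct dots is unlikely by chance, so a match is a reasonable heuristic for a copied value.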

State identification

I follow the rules as outlined in the proposal. They are fairly straightforward, except for the case of four dots.

From John:

In the cell each green dot is linked directly to one red dot (they are on the same sister chromosome) and likewise the same green dot is not linked to the other red dot. Unfortunately we cannot say unambiguously which green should match with which red as we have no way to distinguish (or our analysis so far has not been so sophisticated). In our JCB paper (you can look at Figure S1C-F) I did some measurements in a small batch of data. As you can see when I was plotting distances between red and green dots (part E) I obtained all four possible distances and put them into the two possible combinations (a & b or c & d). For the analysis and plotting I just took the shortest combined pairs of distances between red and green dots – in the cartoon this is a and b (and discarded distances c and d). For the red pattern to be true, a and b should both be less than 0.3-0.4 µm (otherwise the pattern is deemed pink).

I follow this approach. I calculate distances for the two possible combinations of red and green dots, and then select the combination with the smaller mean distance. The state is deemed red when both distances in this combination are less than the limit.

This cartoon figure defines distances a, b, r and g.
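The pairing rule above can be sketched as follows. A minimal Python illustration (the pipeline itself is in R); dot positions are (x, y, z) tuples in µm, and the 0.4 µm red/pink limit is taken from the text:

```python
from math import dist

RED_LIMIT = 0.4  # red/pink limit in µm (0.3-0.4 per John's description)

def red_or_pink(r1, r2, g1, g2):
    """Pick the red-green pairing with the smaller mean distance; the
    state is red only if both distances in that pairing are below the
    limit, otherwise pink."""
    pairing_ab = (dist(r1, g1), dist(r2, g2))  # a & b in the cartoon
    pairing_cd = (dist(r1, g2), dist(r2, g1))  # c & d in the cartoon
    best = min(pairing_ab, pairing_cd, key=lambda p: sum(p) / 2)
    return "red" if max(best) < RED_LIMIT else "pink"
```

Since both pairings have two distances, comparing means is the same as comparing sums; the mean is kept here to match the text.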

Tracking cells

This figure demonstrates how state tracking works. The shapes indicate the number of dots detected, the colour indicates the state. The horizontal dashed lines show the distance limits applied.

This is a more traditional look at the data:

What might be of interest is the distribution of distances between dots. The next figure shows the aggregated distribution across all cells, for two, three and four dots detected. In the case of four dots both distances are plotted, hence there are twice as many data points as images. Also, there are pink states below the limit (0.4 µm): these are the cases where only one distance is below the limit.

Another plot showing all distances (a, b, r and g):

Dendrogram

Red-red/green-green angle

Angle distribution between red-red and green-green vectors - only for cases with four dots.
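For reference, the angle can be computed from the two vectors with the standard dot-product formula. A minimal sketch (hypothetical function name, positions as (x, y, z) tuples):

```python
from math import acos, degrees, sqrt

def rr_gg_angle(r1, r2, g1, g2):
    """Angle in degrees between the red-red and green-green vectors.
    Depending on how dot order is fixed, one may fold the result into
    [0, 90] via min(angle, 180 - angle)."""
    u = [b - a for a, b in zip(r1, r2)]
    v = [b - a for a, b in zip(g1, g2)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(x * x for x in v))
    # clamp to [-1, 1] to guard against floating-point overshoot
    return degrees(acos(max(-1.0, min(1.0, dot / norm))))
```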

Here is a timeline of angles, divided into windows of 10 min.

The next figure shows the relation between maximum of the distances a and b, and the angle.

The same figure, but split into time windows.

What if the large red-green angles at small a/b distances are a result of increased error when r/g distances are also small? This figure shows the red-green angle as a function of the smallest of the r, g distances. It only contains data below the red-pink limit (the vertical dashed line in figures above).

Indeed, most of the excessive angles appear when r or g are small. Perhaps, with the brown-pink limit set to 0.5 µm, these high angles can be attributed to brown.

Signal-to-noise

We define signal as the maximum dot intensity and noise as the mean intensity across the extended volume. I match colours here, that is, for a red dot I use the extended volume in the red channel, and likewise for green. The figures below show the intensity (left) for the extended-volume background (black) and the corresponding dot intensity (red and green). The right panels show signal-to-noise, with a smooth line fitted for guidance.
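The definition, with the channel-matching rule, amounts to the following sketch (the dict fields are hypothetical, not the actual column names in the Excel sheets):

```python
def snr(dot, background):
    """Signal-to-noise for one dot: its maximum intensity divided by the
    mean extended-volume intensity, both taken in the channel matching
    the dot's own colour."""
    channel = dot["colour"]
    return dot["max_intensity"][channel] / background["mean_intensity"][channel]
```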

no_siRNA

NCAPD2_siRNA

NCAPD3_siRNA

All data combined

The following figure contains all data combined. It can be used to judge where to make a low S/N cut.

Mean, sum intensity and volume

I am trying to understand what the quantities provided in the Excel sheets mean. I assume that \(Sum\) is the total intensity across the volume \(Volume\). The mean intensity should then be

\[Mean = \frac{Sum}{N},\]

where \(N\) is the number of pixels. It is tempting to assume that \(N\) is directly proportional to \(Volume\), but is that so? The figures below show data from dots (left) and extended volume (right). On the horizontal axis is the \(Mean\), on the vertical axis is \(Sum/Volume\). If our assumptions are correct, we should see a straight line.

It seems that \(Volume \propto N\) is true for the extended volume. However, the picture for the dots is more complicated. Have a look at one particular cell. This time I plot \(Sum\) versus \(Mean\):

I can see 8 separate straight lines, suggesting 8 distinct values of \(N\). However, there are only two \(Volume\) values here and they don’t seem to correlate with the slope. Therefore, the equation \(Mean = Sum/Volume\) does not work.

We can look at this from the other end. If \(Mean = Sum/N\), we can reconstruct \(N = Sum/Mean\). Here it is for all our cells:

Indeed, dividing \(Sum\) by \(Mean\) returns an integer number, typically around 10, but for the last cell there is a group of \(N\)s between 100 and 1000. I don’t think these are pixels, but whatever they are, they are definitely whole numbers.
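The integer check used here can be sketched as follows (hypothetical function, assuming a small tolerance for floating-point round-off):

```python
def reconstruct_n(sum_intensity, mean_intensity, tol=1e-3):
    """Reconstruct N = Sum / Mean; return it as an integer if it is a
    whole number (within tol), otherwise None."""
    n = sum_intensity / mean_intensity
    return round(n) if abs(n - round(n)) < tol else None
```

A run of `None`s over a sheet would indicate that \(Mean\) is not simply \(Sum\) divided by a whole count.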

Voxels and volume

E-mail from John:

Along these lines I tried to explore how changing volume size of the dot would affect the “fluorescent values” in the sheets. I took 10 points at the beginning of one movie and identified dots in the first 10 frames (define xyz) in my usual way and I took the statistics for this original one ["..with background subtraction (normal diameter).xls]. I then duplicated the objects and manipulated the dot sizes (increased diameters by a factor of 2, decreased by a factor of 5, and used 1x or 2x pixel size dimensions (see above paragraph)) and each time extracted new statistics (note that xyz positions are identical between the sheets). You can see that when the pixel size gets smaller it more frequently fails to return a fluorescence value (though, even in the smallest case where the object is just 0.0010µm3 it still returns values for two objects suggesting it is more than just defining fluorescence values from completely submerged voxels). Maybe you can see how the denominator looks in each of these cases?

Finally, I took statistics for the same objects which were defined separately without background subtraction applied. The fluorescent values seem broadly to be the same, so my guess is that there is no correction of the values in the reported sheets – rather the object is identified using an algorithm that includes (or doesn’t) background subtraction and then the statistics report the raw fluorescent values. Note how the xyz positions are slightly different between these two sheets which probably accounts for the minor differences seen.

I took these data and put them all together in one figure. I normalised all data to the “normal” set with background subtraction. The vertical axis shows the ratio on a logarithmic scale.

Session info

## R version 4.1.1 (2021-08-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] kableExtra_1.3.4 forcats_0.5.1    stringr_1.4.0    dplyr_1.0.7      purrr_0.3.4     
##  [6] readr_2.0.0      tidyr_1.1.3      tibble_3.1.3     ggplot2_3.3.5    tidyverse_1.3.1 
## [11] targets_0.6.0   
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-152        fs_1.5.0            lubridate_1.7.10    webshot_0.5.2      
##  [5] httr_1.4.2          tools_4.1.1         backports_1.2.1     bslib_0.2.5.1      
##  [9] utf8_1.2.2          R6_2.5.1            vipor_0.4.5         DBI_1.1.1          
## [13] mgcv_1.8-36         colorspace_2.0-2    withr_2.4.2         tidyselect_1.1.1   
## [17] processx_3.5.2      compiler_4.1.1      cli_3.0.1           rvest_1.0.1        
## [21] xml2_1.3.2          labeling_0.4.2      stringfish_0.15.2   sass_0.4.0         
## [25] scales_1.1.1        callr_3.7.0         systemfonts_1.0.2   digest_0.6.27      
## [29] rmarkdown_2.10      svglite_2.0.0       pkgconfig_2.0.3     htmltools_0.5.2    
## [33] dbplyr_2.1.1        fastmap_1.1.0       highr_0.9           rlang_0.4.11       
## [37] readxl_1.3.1        rstudioapi_0.13     jquerylib_0.1.4     farver_2.1.0       
## [41] generics_0.1.0      RApiSerialize_0.1.0 jsonlite_1.7.2      magrittr_2.0.1     
## [45] Matrix_1.3-4        Rcpp_1.0.7          ggbeeswarm_0.6.0    munsell_0.5.0      
## [49] fansi_0.5.0         lifecycle_1.0.0     stringi_1.7.4       yaml_2.2.1         
## [53] grid_4.1.1          crayon_1.4.1        lattice_0.20-44     haven_2.4.2        
## [57] cowplot_1.1.1       splines_4.1.1       hms_1.1.0           knitr_1.33         
## [61] ps_1.6.0            pillar_1.6.2        igraph_1.2.6        codetools_0.2-18   
## [65] reprex_2.0.0        glue_1.4.2          evaluate_0.14       data.table_1.14.0  
## [69] RcppParallel_5.1.4  modelr_0.1.8        vctrs_0.3.8         png_0.1-7          
## [73] tzdb_0.1.2          cellranger_1.1.0    gtable_0.3.0        qs_0.25.1          
## [77] assertthat_0.2.1    xfun_0.25           broom_0.7.9         viridisLite_0.4.0  
## [81] beeswarm_0.4.0      ellipsis_0.3.2